The classical static Schrödinger Bridge (SSB) problem, which seeks the most likely stochastic evolution between two marginal probability measures, has been studied extensively in the optimal transport and statistical physics communities, and more recently in the machine learning community amid the surge of generative models. The standard approach to solving SSB is to first identify its Kantorovich dual and then use Sinkhorn's algorithm to find the optimal potential functions. Although the original SSB is only a strictly (not strongly) convex minimization problem, this approach is known to achieve linear convergence under mild assumptions. In this work, we consider a generalized SSB allowing any strictly increasing divergence functional, far generalizing the entropy functional $$x \log x$$ in the standard SSB. This problem arises naturally in a wide range of seemingly unrelated problems in entropic optimal transport, random graphs/matrices, and combinatorics. We establish Kantorovich duality and linear convergence of Sinkhorn's algorithm for the generalized SSB problem under mild conditions. Our results provide a new rigorous foundation for understanding Sinkhorn-type iterative methods in the context of large-scale generalized Schrödinger bridges.
Free, publicly-accessible full text available July 16, 2026.
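As a rough illustration of the Sinkhorn iteration underlying this line of work, here is a minimal sketch for the standard entropy-regularized case on discrete marginals. The generalized divergences studied in the paper are not implemented; the function name and parameters are illustrative.

```python
import numpy as np

def sinkhorn(C, mu, nu, eps=0.1, n_iter=500):
    """Sinkhorn iterations for entropic OT / the standard static
    Schrodinger bridge on discrete marginals.

    C: (n, m) cost matrix; mu, nu: marginal probability vectors.
    Returns a coupling P whose marginals match mu and nu."""
    K = np.exp(-C / eps)            # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)            # rescale to match the row marginal
        v = nu / (K.T @ u)          # rescale to match the column marginal
    return u[:, None] * K * v[None, :]
```

Each iteration alternately projects onto the two marginal constraints; the linear convergence discussed in the abstract refers to how fast these potentials `u`, `v` stabilize.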
-
We consider the branch-length estimation problem on a bifurcating tree: a character evolves along the edges of a binary tree according to a two-state symmetric Markov process, and we seek to recover the edge transition probabilities from repeated observations at the leaves. This problem arises in phylogenetics and is related to latent tree graphical model inference. In general, the log-likelihood function is non-concave and may admit many critical points. Nevertheless, simple coordinate maximization has been known to perform well in practice, defying the complexity of the likelihood landscape. In this work, we provide the first theoretical guarantee as to why this might be the case. We show that, deep inside the Kesten-Stigum reconstruction regime, given polynomially many samples $$m$$ (assuming the tree is balanced), there exists a universal parameter regime (independent of the size of the tree) in which the log-likelihood function is strongly concave and smooth with high probability. On this high-probability event, we show that the standard coordinate maximization algorithm converges exponentially fast to the maximum likelihood estimator, which is within $$O(1/\sqrt{m})$$ of the true parameter, provided a sufficiently close initial point.
Free, publicly-accessible full text available July 16, 2026.
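To make the $$O(1/\sqrt{m})$$ statistical rate concrete in the simplest possible setting, here is a toy sketch for a single fully observed edge of the tree: under the two-state symmetric model the MLE of an edge's transition probability is just the empirical mismatch frequency. This is a deliberately simplified stand-in (the paper's setting observes only the leaves); all names are illustrative.

```python
import numpy as np

def simulate_edge(p, m, rng):
    """Evolve a +/-1 character across one tree edge: the child state
    flips with probability p, independently for each of m samples."""
    parent = rng.choice([-1, 1], size=m)
    flips = rng.random(m) < p
    child = np.where(flips, -parent, parent)
    return parent, child

def mle_edge(parent, child):
    """For a fully observed edge, the MLE of the transition probability
    is the empirical fraction of parent/child mismatches."""
    return float(np.mean(parent != child))
```

With $$m$$ samples the estimate concentrates around the true $$p$$ at the $$O(1/\sqrt{m})$$ rate mentioned in the abstract.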
-
Abstract: The box-ball systems are integrable cellular automata whose long-time behavior is characterized by soliton solutions, with rich connections to other integrable systems such as the Korteweg-de Vries equation. In this paper, we consider a multicolor box-ball system with two types of random initial configurations and obtain sharp scaling limits of the soliton lengths as the system size tends to infinity; these limits turn out to be more delicate than in the single-color case established in [LLP20]. A large part of our analysis is devoted to studying the associated carrier process, a multidimensional Markov chain on the orthant whose excursions and running maxima are closely related to soliton lengths. We establish the sharp scaling of its ruin probabilities, a Skorokhod decomposition, a strong law of large numbers, and a weak diffusive scaling limit to a semimartingale reflecting Brownian motion with explicit parameters. We also establish and utilize complementary descriptions of the soliton lengths and numbers in terms of modified Greene-Kleitman invariants for the box-ball systems and associated circular exclusion processes.
Free, publicly-accessible full text available December 10, 2025.
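For readers unfamiliar with the carrier dynamics mentioned above, here is a minimal sketch of one time step of the single-color box-ball system (the multicolor system in the paper generalizes this). The carrier sweeps left to right, picking up every ball it meets and dropping one ball into each empty box while it is nonempty; names are illustrative, and the configuration is assumed to have enough trailing empty boxes.

```python
def bbs_step(config):
    """One time step of the single-color box-ball system via the
    carrier sweep.  config is a list of 0s (empty box) and 1s (ball)."""
    carrier = 0
    out = []
    for box in config:
        if box == 1:
            carrier += 1          # carrier picks up the ball
            out.append(0)
        elif carrier > 0:
            carrier -= 1          # carrier drops a ball into the empty box
            out.append(1)
        else:
            out.append(0)
    return out
```

Under this update, a block of k consecutive balls (a soliton of length k) travels k sites per step, so longer solitons overtake shorter ones, which is what makes the soliton-length statistics studied in the paper meaningful.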
-
We investigate two discrete models of excitable media on the one-dimensional integer lattice ℤ: the κ-color Cyclic Cellular Automaton (CCA) and the κ-color Firefly Cellular Automaton (FCA). In both models, sites are assigned uniformly random colors from ℤ/κℤ. Neighboring sites with colors within a specified interaction range r tend to synchronize their colors upon a particular local event of 'excitation'. We establish that there are three phases of CCA/FCA on ℤ as we vary the interaction range r. First, if r is too small (undercoupled), there are too many non-interacting pairs of colors, and the whole lattice ℤ is partitioned into non-interacting intervals of sites, with no excitation within each interval. If r is within a sweet spot (critical), we show the system clusters into ever-growing monochromatic intervals. For the critical interaction range r = ⌊κ/2⌋, we show the density of edges of differing colors at time t is Θ(t^{-1/2}) and each site excites Θ(t^{1/2}) times up to time t. Lastly, if r is too large (overcoupled), neighboring sites can excite each other, and such 'defects' generate waves of excitation at a constant rate, so that each site gets excited at least at a linear rate. For the special case of FCA with r = ⌊κ/2⌋+1, we show that every site eventually becomes (κ+1)-periodic.
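As background for the color-synchronization dynamics described above, here is a minimal sketch of the classical nearest-neighbor cyclic cellular automaton on a finite cycle (a simplification of the range-r models in the paper): a site advances its color c to c+1 (mod κ) whenever a neighbor already carries color c+1. Names and the finite-cycle setup are illustrative.

```python
def cca_step(colors, kappa):
    """Synchronous update of the classical cyclic cellular automaton
    on a cycle of len(colors) sites with colors in Z/kappa.Z:
    a site with color c becomes c+1 (mod kappa) when some nearest
    neighbor currently has color c+1 (mod kappa)."""
    n = len(colors)
    new = list(colors)
    for i in range(n):
        nxt = (colors[i] + 1) % kappa
        if colors[(i - 1) % n] == nxt or colors[(i + 1) % n] == nxt:
            new[i] = nxt          # the site is 'excited' by its neighbor
    return new
```

On the configuration [0, 1, 2] with κ = 3, every site is excited at every step, so the pattern rotates and returns to itself after κ steps; the paper's phase diagram concerns how often such excitations occur on ℤ as the range r varies.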
-
To obtain optimal first-order convergence guarantees for stochastic optimization, it is necessary to use a recurrent data sampling algorithm that samples every data point with sufficient frequency. Most commonly used data sampling algorithms (e.g., i.i.d., MCMC, random reshuffling) are indeed recurrent under mild assumptions. In this work, we show that for a particular class of stochastic optimization algorithms, no further property (e.g., independence, exponential mixing, or reshuffling) beyond recurrence in data sampling is needed to guarantee the optimal rate of first-order convergence. Namely, using regularized versions of Minimization by Incremental Surrogate Optimization (MISO), we show that for non-convex and possibly non-smooth objective functions with constraints, the expected optimality gap converges at the optimal rate $$O(n^{-1/2})$$ under general recurrent sampling schemes. Furthermore, the implied constant depends explicitly on the 'speed of recurrence', measured by the expected amount of time to visit a data point, either averaged ('target time') or supremized ('hitting time') over the starting locations. We discuss applications of our general framework to decentralized optimization and distributed non-negative matrix factorization.
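The 'speed of recurrence' quantities above can be estimated empirically for any sampling scheme. Here is a minimal sketch that measures the expected number of draws until a given data point is visited; the function name and the i.i.d.-uniform example in the test are illustrative, not the paper's method.

```python
import random

def mean_hitting_time(sampler, target, trials, rng):
    """Empirical expectation of the number of draws until `sampler`
    produces `target` -- a proxy for the recurrence speed that
    controls the implied constant in the convergence rate.

    sampler: callable taking an rng and returning one data index."""
    total = 0
    for _ in range(trials):
        t = 0
        while True:
            t += 1
            if sampler(rng) == target:
                break
        total += t
    return total / trials
```

For i.i.d. uniform sampling over N data points, the waiting time is geometric with mean N; schemes like random reshuffling visit every point within each epoch and hence have a smaller worst-case hitting time.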
-
We consider the problem of finding stationary points in bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions $$y^*(x)$$ in response to changes in the upper-level variables $$x$$. Consequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $$y^*$$-aware, that returns an $$O(\epsilon)$$-estimate of the lower-level solution, in addition to first-order gradient estimators that are locally unbiased within the $$\Theta(\epsilon)$$-ball around $$y^*(x)$$. We study the complexity of finding stationary points with such a $$y^*$$-aware oracle: we propose a simple first-order method that converges to an $$\epsilon$$-stationary point using $$O(\epsilon^{-6})$$ and $$O(\epsilon^{-4})$$ accesses to first-order $$y^*$$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by $$O(\epsilon)$$ with minimal assumptions. We then provide the matching $$\Omega(\epsilon^{-6})$$ and $$\Omega(\epsilon^{-4})$$ lower bounds, without and with an additional smoothness assumption on the $$y^*$$-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with a $$y^*$$-aware oracle must suffer the same lower bounds.
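To illustrate the role of the lower-level solution $$y^*(x)$$ in computing upper-level gradients, here is a toy sketch of deterministic hypergradient descent for a problem where the strongly convex lower level can be solved exactly; the specific objectives, names, and step size are illustrative assumptions, not the paper's algorithm.

```python
def solve_bilevel(lr=0.1, steps=200):
    """Gradient descent on F(x) = f(x, y*(x)) for a toy bilevel problem.

    Lower level:  g(x, y) = (y - x)^2 / 2  (strongly convex), so y*(x) = x.
    Upper level:  f(x, y) = (x - 1)^2 + (y - 2)^2.
    By the implicit function theorem, dy*/dx = -g_yy^{-1} g_yx = 1, so
    dF/dx = f_x + f_y * dy*/dx = 2(x - 1) + 2(y - 2)."""
    y_star = lambda x: x              # exact lower-level solution
    x = 0.0
    for _ in range(steps):
        y = y_star(x)
        grad = 2 * (x - 1) + 2 * (y - 2)   # hypergradient of F at x
        x -= lr * grad
    return x
```

Here F(x) = (x-1)^2 + (x-2)^2 is minimized at x = 1.5; the difficulty the paper addresses is precisely that in general one only has noisy, $$O(\epsilon)$$-accurate access to $$y^*(x)$$ rather than this exact oracle.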